Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DuckDBAppender: Add enum support #210

Merged
merged 10 commits into from
Aug 19, 2024
Merged

Conversation

eschgi
Copy link
Contributor

@eschgi eschgi commented Aug 17, 2024

Hello,

the DuckDBAppender class now supports adding enum values.

I want to get some feedback from you @Giorgi, if the code is ok and if i should improve the error handling in the EnumVectorDataWriter class?

I can also add additional tests if needed.

Issue: #209

Kind regards
Stefan

@coveralls
Copy link

coveralls commented Aug 17, 2024

Pull Request Test Coverage Report for Build 10453973952

Details

  • 48 of 60 (80.0%) changed or added relevant lines in 5 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.4%) to 89.815%

Changes Missing Coverage Covered Lines Changed/Added Lines %
DuckDB.NET.Data/Internal/Writer/ListVectorDataWriter.cs 6 7 85.71%
DuckDB.NET.Data/Internal/Writer/VectorDataWriterBase.cs 2 3 66.67%
DuckDB.NET.Data/Internal/Writer/EnumVectorDataWriter.cs 38 48 79.17%
Files with Coverage Reduction New Missed Lines %
DuckDB.NET.Data/Internal/Writer/VectorDataWriterBase.cs 1 81.67%
Totals Coverage Status
Change from base Build 10090837361: -0.4%
Covered Lines: 1757
Relevant Lines: 1926

💛 - Coveralls

@Giorgi
Copy link
Owner

Giorgi commented Aug 17, 2024

Thanks for the PR, looks good!

Some comments:

  • What happens if you try to insert an Enum with a value that is outside of the DuckDB enum? Does it crash the app with Access Violation or does it return some other error?
  • I'm not sure if we should support writing strings to an Enum column. The caller can always parse it to a .Net Enum and pass that and we will be able to get rid of enumerating enum values. We can add support for strings later if there is a need.
  • Can you add a test that writes NULL to the Enum column to verify that NULL is written correctly?
  • Can you add a test that writes List of Enums to the database? It should work out of the box but would be nice to have a test for it.

@eschgi
Copy link
Contributor Author

eschgi commented Aug 18, 2024

Hi @Giorgi,

thanks for the feedback.

I will add the missing tests and also check what happens if an enum value is out of range.

Regarding your second point:

It is currently possible with the latest released version of DuckDB.NET to use string parameters to set enum fields. It would make sense to me if the DuckDBAppender class also supports this. At my work, we may be using DuckDB.NET in the future, so this feature would help me. I will improve the performance so that developers who don't need this don't have a disadvantage. Also other libraries like Npgsql support the same feature (https://www.npgsql.org/doc/types/enums_and_composites.html#reading-and-writing-unmapped-enums).

Here is my example code where I use string parameters to set an enum field:

using System.Data;
using DuckDB.NET.Data;

File.Delete("file.db");

using var duckDBConnection = new DuckDBConnection("Data Source=file.db");
duckDBConnection.Open();

using var command = duckDBConnection.CreateCommand();

command.CommandText = "CREATE TYPE test_enum AS ENUM ('test1', 'test2', 'test3')";
command.ExecuteNonQuery();

command.CommandText = "CREATE TABLE test (a test_enum not null);";
command.ExecuteNonQuery();

command.CommandText = "INSERT INTO test (a) VALUES ($a);";
foreach (var enumValue in new string[] { "test1", "test2", "test3" })
{
    command.Parameters.Clear();
    command.Parameters.Add(new DuckDBParameter("a", enumValue));
    command.ExecuteNonQuery();
}

command.CommandText = "SELECT a FROM test;";
var reader = command.ExecuteReader();
while (reader.Read())
{
    string enumValue = reader.GetString("a");
    Console.WriteLine(enumValue);
}

@Giorgi With the above explanation, is it ok for you if we support writing strings to an Enum column?

Kind regards
Stefan

@eschgi
Copy link
Contributor Author

eschgi commented Aug 18, 2024

Hi @Giorgi,

something related.

As far as I understand it, in DuckDB.NET the enum values are mapped via the numeric value and not according to the enum value names.

In the Npgsql library, for example, the enum value name is used for the mapping. I would prefer such an approach. See the following example code:

using System.Data;
using Npgsql;
using NpgsqlTypes;

var connectionString = "Host=127.0.0.1;Username=postgres;Password=xxx;Database=test_npgsql";

var dataSourceBuilder = new NpgsqlDataSourceBuilder(connectionString);
dataSourceBuilder.MapEnum<TestEnum>();
var dataSource = dataSourceBuilder.Build();

var npgsqlConnection = dataSource.OpenConnection();

using var command = npgsqlConnection.CreateCommand();

command.CommandText = "DROP TABLE IF EXISTS test;";
command.ExecuteNonQuery();

command.CommandText = "DROP TYPE IF EXISTS test_enum;";
command.ExecuteNonQuery();

command.CommandText = "CREATE TYPE test_enum AS ENUM ('test1', 'test2', 'test3')";
command.ExecuteNonQuery();

command.CommandText = "CREATE TABLE test (a test_enum not null);";
command.ExecuteNonQuery();

npgsqlConnection.ReloadTypes();

command.CommandText = "INSERT INTO test (a) VALUES (:a);";
foreach (var enumValue in new string[] { "test1", "test2", "test3" })
{
    command.Parameters.Clear();
    command.Parameters.Add(new NpgsqlParameter("a", enumValue) { DataTypeName = "test_enum" });
    command.ExecuteNonQuery();
}

command.CommandText = "SELECT a FROM test ORDER BY a ASC;";
var reader = command.ExecuteReader();
while (reader.Read())
{
    TestEnum enumValue = reader.GetFieldValue<TestEnum>("a");
    Console.WriteLine(enumValue);
}

public enum TestEnum
{
    [PgName("test3")]
    Test_3,
    [PgName("test2")]
    Test_2,
    [PgName("test1")]
    Test_1,
}

@Giorgi Do we want to change this?

Kind regards
Stefan

@Giorgi
Copy link
Owner

Giorgi commented Aug 18, 2024

Ok, let's keep strings too. In such case can you move enumValueIndexDictionary initialization from ctor to the AppendString method? But make sure to initialize it only once.

Also, can you update the test to read the Enum that was appended as string in the Enum type and vice-versa to test all the combinations?

Thanks!

@Giorgi
Copy link
Owner

Giorgi commented Aug 18, 2024

Hi @Giorgi,

something related.

As far as I understand it, in DuckDB.NET the enum values are mapped via the numeric value and not according to the enum value names.

In the Npgsql library, for example, the enum value name is used for the mapping. I would prefer such an approach. See the following example code:

using System.Data;
using Npgsql;
using NpgsqlTypes;

var connectionString = "Host=127.0.0.1;Username=postgres;Password=xxx;Database=test_npgsql";

var dataSourceBuilder = new NpgsqlDataSourceBuilder(connectionString);
dataSourceBuilder.MapEnum<TestEnum>();
var dataSource = dataSourceBuilder.Build();

var npgsqlConnection = dataSource.OpenConnection();

using var command = npgsqlConnection.CreateCommand();

command.CommandText = "DROP TABLE IF EXISTS test;";
command.ExecuteNonQuery();

command.CommandText = "DROP TYPE IF EXISTS test_enum;";
command.ExecuteNonQuery();

command.CommandText = "CREATE TYPE test_enum AS ENUM ('test1', 'test2', 'test3')";
command.ExecuteNonQuery();

command.CommandText = "CREATE TABLE test (a test_enum not null);";
command.ExecuteNonQuery();

npgsqlConnection.ReloadTypes();

command.CommandText = "INSERT INTO test (a) VALUES (:a);";
foreach (var enumValue in new string[] { "test1", "test2", "test3" })
{
    command.Parameters.Clear();
    command.Parameters.Add(new NpgsqlParameter("a", enumValue) { DataTypeName = "test_enum" });
    command.ExecuteNonQuery();
}

command.CommandText = "SELECT a FROM test ORDER BY a ASC;";
var reader = command.ExecuteReader();
while (reader.Read())
{
    TestEnum enumValue = reader.GetFieldValue<TestEnum>("a");
    Console.WriteLine(enumValue);
}

public enum TestEnum
{
    [PgName("test3")]
    Test_3,
    [PgName("test2")]
    Test_2,
    [PgName("test1")]
    Test_1,
}

@Giorgi Do we want to change this?

Kind regards
Stefan

I think this won't work with Appender because when appending string as Enum we don't know the underlying C# Enum.

@eschgi
Copy link
Contributor Author

eschgi commented Aug 18, 2024

Hi @Giorgi,

i think i could get the mapping working, my question is more, if mapping enums by name would be also a better approach for you? :) In my opinion, this approach is less error-prone.

@Giorgi
Copy link
Owner

Giorgi commented Aug 18, 2024

Let's keep it as it is for now.

@eschgi
Copy link
Contributor Author

eschgi commented Aug 18, 2024

Ok, I'll make the necessary changes as discussed.

@eschgi
Copy link
Contributor Author

eschgi commented Aug 19, 2024

Hi @Giorgi,

I implemented all the discussed changes, except the lazy initialization of the enumValueIndexDictionary / enumValues dictionary.

During writing the list tests I noticed that the logical type handle is released earlier than expected, so a lazy initialization is currently not so easy to implement (at least for me).

Can you please review the PR again?

Kind regards
Stefan

@eschgi eschgi marked this pull request as ready for review August 19, 2024 13:15
@Giorgi
Copy link
Owner

Giorgi commented Aug 19, 2024

I'll review it later today. Why did you need to add WriteItemsFallback ?

@eschgi
Copy link
Contributor Author

eschgi commented Aug 19, 2024

I'll review it later today. Why did you need to add WriteItemsFallback ?

The cast to IEnumerable<object> does not work when you have a list with enums.

@Giorgi Giorgi changed the base branch from develop to enum-appender August 19, 2024 19:24
@Giorgi Giorgi merged commit 7c18505 into Giorgi:enum-appender Aug 19, 2024
5 of 10 checks passed
@eschgi eschgi deleted the enum-support branch August 20, 2024 06:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants